
feat(sdk): add otel sink to generate and emit OTEL events#156

Open
namrataghadi-galileo wants to merge 1 commit into feature/59787-merge-events from feature/59793-otel-emission-for-merged-events


@namrataghadi-galileo
Contributor

Summary

  • Adds an optional OTEL emission path for merged control execution events in the Python SDK.
  • When OTEL is configured, the SDK registers an OTEL-backed ControlEventSink and uses the existing merged-event flow to emit reconstructed local and server ControlExecutionEvents as OTEL spans.
  • This keeps the default SDK/server observability behavior unchanged while enabling OTEL export for the merged-event path.
  • Also adds a new example that shows control creation, merged event reconstruction, OTEL export, and collector-style payload inspection end to end.

Scope

User-facing/API changes:

  • Adds SDK settings for OTEL event emission:
      • AGENT_CONTROL_OTEL_ENABLED
      • AGENT_CONTROL_OTEL_ENDPOINT
      • AGENT_CONTROL_OTEL_HEADERS
      • AGENT_CONTROL_OTEL_SERVICE_NAME
  • Adds exported SDK OTEL helpers:
      • create_otel_event_sink(...)
      • configure_otel_event_sink(...)
      • control_event_to_otel_span(...)
      • control_event_to_otel_attributes(...)
      • is_otel_event_emission_configured(...)
  • Adds a new runnable example under examples/otel_merged_events
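
For reviewers who want a feel for how the four settings fit together, here is a minimal sketch of reading them from the environment. It is not the SDK's settings code: `OtelSettings`, `parse_headers`, and `load_otel_settings` are hypothetical names, the accepted truthy values are an assumption, and the comma-separated `key=value` header format is an assumption borrowed from the convention used by OpenTelemetry's own OTLP header variable.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class OtelSettings:
    """Hypothetical container for the four settings; not the SDK's class."""
    enabled: bool = False
    endpoint: Optional[str] = None
    headers: Dict[str, str] = field(default_factory=dict)
    service_name: Optional[str] = None


def parse_headers(raw: str) -> Dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict (assumed format)."""
    headers: Dict[str, str] = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        headers[key.strip()] = value.strip()
    return headers


def load_otel_settings(env: Optional[Mapping[str, str]] = None) -> OtelSettings:
    """Build settings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    flag = env.get("AGENT_CONTROL_OTEL_ENABLED", "").strip().lower()
    return OtelSettings(
        enabled=flag in {"1", "true", "yes"},  # assumed truthy spellings
        endpoint=env.get("AGENT_CONTROL_OTEL_ENDPOINT") or None,
        headers=parse_headers(env.get("AGENT_CONTROL_OTEL_HEADERS", "")),
        service_name=env.get("AGENT_CONTROL_OTEL_SERVICE_NAME") or None,
    )
```

Passing an explicit mapping instead of mutating `os.environ` keeps a sketch like this easy to exercise in a unit test.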

Internal changes:

  • Adds sdks/python/src/agent_control/telemetry/otel.py
  • Registers the OTEL sink automatically during agent_control.init(...) when OTEL is configured and no other control event sink is already registered
  • Reuses the existing merged-event emission path instead of changing the default local/server event behavior
  • Maps reconstructed control-event fields and metadata into OTEL span attributes

Out of scope:

  • Galileo-side OTEL ingestion, normalization, or persistence
  • Any changes to default OSS event queueing/server ingestion behavior
  • New backend OTEL collector integration in this repo
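
To illustrate the kind of field-to-attribute mapping described under internal changes, here is a rough sketch that flattens an event dictionary into dotted attribute keys. It does not reproduce `control_event_to_otel_attributes(...)`: the `agent_control` prefix, the function name `flatten_event`, and the sample field names are all assumptions. It relies on one fact about OTEL, that attribute values must be primitives (or arrays of primitives) and cannot be null, so `None` is dropped and anything else non-primitive is stringified here for simplicity.

```python
from typing import Any, Dict, Mapping

ATTRIBUTE_PREFIX = "agent_control"  # hypothetical namespace, not confirmed by the PR


def flatten_event(event: Mapping[str, Any], prefix: str = ATTRIBUTE_PREFIX) -> Dict[str, Any]:
    """Flatten a nested event mapping into dotted, OTEL-friendly attributes."""
    attributes: Dict[str, Any] = {}
    for key, value in event.items():
        name = f"{prefix}.{key}"
        if value is None:
            continue  # null is not a valid attribute value, so drop it
        if isinstance(value, Mapping):
            # Recurse so nested metadata becomes prefix.metadata.<field>
            attributes.update(flatten_event(value, name))
        elif isinstance(value, (str, bool, int, float)):
            attributes[name] = value
        else:
            # Simplification: stringify lists and other objects
            attributes[name] = str(value)
    return attributes


# Illustrative event; these field names are made up for the sketch.
example = flatten_event({
    "control_id": "c-1",
    "matched": True,
    "metadata": {"step": 3},
    "error": None,
})
```

A flat, prefixed key space like this is one way to keep the attribute contract easy to document for downstream ingestion.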

Expected Behavior

  • Default behavior is unchanged:
      • local events continue through the existing SDK observability queue
      • server-side events continue to be built and ingested by the server
  • OTEL only kicks in when OTEL is configured in SDK settings/env vars and no other control event sink has already been registered.
  • In that case:
      • agent_control.init(...) auto-registers an OTEL ControlEventSink
      • evaluation switches into the existing merged-event mode
      • the SDK reconstructs local and server control execution events after evaluation
      • the SDK emits one merged batch through the OTEL sink
      • each merged event is converted into an OTEL span with AgentControl-specific attributes
  • If OTEL is not configured, or OTEL packages are not installed, this path stays inert and existing behavior remains unchanged.
  • Explicitly registered sinks still take precedence over OTEL auto-registration.
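
The precedence rules above can be summarized in a small sketch. This is not the code in `agent_control.init(...)`: `SinkRegistry`, `maybe_register_otel_sink`, and `otel_packages_available` are hypothetical stand-ins, and checking for the top-level `opentelemetry` package is an assumed way of detecting whether the OTEL dependencies are installed.

```python
import importlib.util
from typing import Any, Callable, Optional


class SinkRegistry:
    """Hypothetical stand-in for wherever the SDK tracks the active sink."""

    def __init__(self) -> None:
        self._sink: Optional[Any] = None

    def register(self, sink: Any) -> None:
        self._sink = sink

    @property
    def sink(self) -> Optional[Any]:
        return self._sink


def otel_packages_available() -> bool:
    """Assumed check: is the top-level 'opentelemetry' package importable?"""
    return importlib.util.find_spec("opentelemetry") is not None


def maybe_register_otel_sink(
    registry: SinkRegistry,
    otel_configured: bool,
    make_sink: Callable[[], Any],
    packages_available: Callable[[], bool] = otel_packages_available,
) -> bool:
    """Register an OTEL sink only when nothing else has claimed the slot."""
    if registry.sink is not None:
        return False  # an explicitly registered sink takes precedence
    if not otel_configured or not packages_available():
        return False  # path stays inert; default behavior is unchanged
    registry.register(make_sink())
    return True
```

Injecting `packages_available` as a callable lets a test cover the "packages not installed" branch without uninstalling anything.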

Risk and Rollout

  • Risk level: medium
  • Rollback plan:
      • Remove or disable OTEL sink auto-registration in init(...)
      • Unset OTEL-related SDK settings/env vars to fall back to existing behavior
      • Revert the new OTEL telemetry module and example if needed

Testing

  • Added or updated automated tests
  • Ran make check (or explained why not)
  • Manually verified behavior
  • Added an example to demonstrate the behavior

Notes:

  • Added focused tests for OTEL configuration, attribute mapping, sink registration, and OTEL span emission behavior.
  • I did not run full make check here.
  • Manually verified the new OTEL merged-events example and iterated on its output to show:
      • control creation
      • local/server evaluation responses
      • merged event reconstruction
      • final collector-facing OTEL payloads

Checklist

  • Linked issue/spec (if applicable)
  • Updated docs/examples for user-facing changes
  • Included any required follow-up tasks

Suggested follow-up tasks:

  • Galileo-side OTEL ingestion support for AgentControl-specific OTEL spans

  • End-to-end CI smoke test for the OTEL merged-events example

  • Document the OTEL attribute contract formally if this path will be consumed by downstream ingestion systems


@codecov

codecov bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 77.98165% with 24 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sdks/python/src/agent_control/telemetry/otel.py | 77.45% | 23 Missing ⚠️ |
| sdks/python/src/agent_control/__init__.py | 50.00% | 1 Missing ⚠️ |
